Intro
An analysis of evacuation routing in Pakistan. We look at flooded, populated areas in Pakistan and assess their travel distance along major roads to the nearest non-flooded location.
Setup
This section covers reproducibility. We use R version 4.3.2. In order to reproduce this analysis you need an openrouteservice API key. Get one here. Put it in a text file called config.txt in the working directory. The file has no structure, just the key on the first line.
ZXCVBNMLKJHGFDSAQWERTYUIOP
libraries
Almost all of the libraries can be installed from CRAN with the command install.packages('packagename'). The openrouteservice package is not on CRAN and needs to be installed from GitHub with the command remotes::install_github("GIScience/openrouteservice-r"). tmap in its most recent version 4 is also not yet available on CRAN, so we need to install it from GitHub too.
# main libraries
library(tidyverse)
library(glue)
library(sf)
library(units)
library(ggplot2)
library(tictoc)
library(openrouteservice) # remotes::install_github("GIScience/openrouteservice-r");
library(jsonlite)
library(terra)
library(exactextractr)
library(tmap) # remotes::install_github("r-tmap/tmap")
library(leaflet)
library(furrr)
library(purrr)
library(kableExtra)
library(mapsf)
library(RColorBrewer)
api_key <- readLines("config.txt", n = 1)
# Function to snap a list of coordinates to the location of the closest road segment via the snapping endpoint of ORS. The public ORS API itself only allows a maximum snapping distance of 350 m.
ors_snap <- function(x, rowids, local=F) {
library(httr)
# Define the body
body <- list(
locations = x,
radius = 100000000 # deliberately huge; the snap endpoint caps the effective radius anyway (apparently at 5 km)
)
# Define the headers
headers <- c(
'Accept' = 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
'Authorization' = api_key,
'Content-Type' = 'application/json; charset=utf-8'
)
endpoint <- if(local){'http://localhost:8080/ors/v2/snap/driving-car'} else {'https://api.openrouteservice.org/v2/snap/driving-car'}
# Make the POST request
response <- POST( endpoint,
#'https://api.openrouteservice.org/v2/snap/driving-car',
body = body,
add_headers(headers),
encode = "json")
resp_content <- content(response)
extract_data <- function(x, index) {
if (is.null(x)) {
return(data.frame(
rowid = index,
lon = NA,
lat = NA,
snapped_distance = NA
))
} else {
return(data.frame(
rowid = index,
lon = x$location[[1]],
lat = x$location[[2]],
snapped_distance = x$snapped_distance
))
}
}
# Extract data from the list and create a dataframe
df <- do.call(rbind, lapply(seq_along(resp_content$locations), function(i) extract_data(resp_content$locations[[i]], i)))
df$rowid <- rowids
# Convert the dataframe to an sf object
sf_df <- df |>
drop_na(lon, lat) |> # Drop rows with NA coordinates
st_as_sf(coords = c("lon", "lat"), crs = 4326) |> # Define the coordinate columns and set CRS
st_sf()
return(sf_df)
}
Processing pipeline
The workflow is as follows:
- Sample locations in Pakistan
- Enrich the samples in order to differentiate into
  - flooded and not flooded
  - snapped and not snapped
  - with population and without population
- Take the snapped, populated, flooded locations and find the 100
nearest non-flooded, snapped locations
By snapping we refer to the process of moving the origin or destination of a route to the nearest road segment. This is necessary because the openrouteservice API only allows a maximum snapping distance of 350 m. In the context of Pakistan, however, greater snapping distances are desirable, as the road density in some regions can be low.
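As a minimal illustration of what such a snap request contains, the body below mirrors the structure used in the ors_snap function from the setup section; the two lon/lat pairs are invented for demonstration.

```r
# Two illustrative lon/lat pairs (invented coordinates, roughly within Pakistan)
locations <- list(c(68.36, 25.38), c(67.01, 24.86))
# Request body as expected by the /v2/snap/driving-car endpoint:
# a list of [lon, lat] locations plus a snapping radius in metres
body <- list(
  locations = locations,
  radius = 5000 # search up to 5 km around each point for a road segment
)
length(body$locations) # 2
```

Serialised to JSON (e.g. via jsonlite), this is the payload the POST request sends.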
Flood preprocessing
In order to join the flood data with the admin boundaries we crop and mask it to the exact boundaries of Pakistan. We set all negative values, which could represent no-data values, to 0. That way we don't run into aggregation problems when summing up cells within a grid cell or other boundary that are less than 0, NA, Inf or NaN.
flood <- flood |> terra::crop(admin)
flood <- flood |> terra::mask(admin)
flood[flood<0] <- 0
Grid sampling
Finding the shortest/fastest way between a number of origins and
destinations is a job for the matrix endpoint of ORS. However, our
endeavour is not a typical matrix request. We are not interested in all
possible connections, but only in the 100 closest non-flooded locations
from each flooded one. So basically, for each origin, one request with 100
destinations. Let's start with the grid sampling. We decide on a
hexagonal grid with a width of 5000 m. A hexagonal grid is closer to a
circle and therefore the better choice for our analysis. Within each grid
cell we will use the centroid and search for road network locations to
snap to.
# create 5 km hexagonal grid
grid1km <-
st_make_grid(admin |> st_transform(32642), cellsize = 5000, square = F, flat_topped = T)
# convert to sf df
grid1km <- grid1km |> st_as_sf()
names(grid1km) <- 'geometry'
st_geometry(grid1km) <- 'geometry'
# get rid of all cells that are not within Pakistan
grid1km <-
grid1km |> st_transform(4326) |> st_filter(admin |> st_union())
# join admin codes
grid1km <-
grid1km |> st_join(admin |> select(admin3_teh, admin3_te0, admin3_pop, admin3_po0, admin3_po1),
left = T,
largest = T)
# join pop information
grid1km$pop <- exact_extract(pop, grid1km, 'sum', progress = F)
grid1km$flood <- exact_extract(flood, grid1km, 'max', progress = F)
# add rowid
grid1km <- grid1km |> mutate(rowid = row_number())
Time for this code chunk to run: 1227.95 seconds
Snapping
The grid is created; now we snap the centroids.
# convert
coord_list <-
lapply(grid1km$geometry |> st_centroid(), function(x)
c(st_coordinates(x)[, 1], st_coordinates(x)[, 2]))
# snap the list of coords
snapped_coords <- ors_snap(coord_list, rowids = grid1km$rowid, local = T)
# join back
grid1km_snapped <- grid1km |>
left_join(snapped_coords |> tibble(), by = "rowid") |>
mutate(centroid = st_centroid(geometry.x))
# rename in order to keep the grid geometry, the centroid and the snapped centroid
names(grid1km_snapped)[10:11] <- c('geometry', 'snapped_centroid')
st_geometry(grid1km_snapped) <- 'geometry'
grid1km_snapped <- grid1km_snapped |> mutate(snapped = ifelse(is.na(snapped_distance), F, T))
Time for this code chunk to run: 27.34 seconds
Overview grid & base data
tmap_mode('plot')
grid1km_snapped |> tm_shape() +
tm_polygons(fill='snapped', lwd=.1, fill.legend = tm_legend(title='Snapped centroid')) + tm_place_legends_inside()

grid1km_snapped |> mutate(pop=ifelse(pop<=0, F, T)) |>
tm_shape() +
tm_polygons(fill='pop', lwd=.1, fill.legend = tm_legend(title='Inhabited')) + tm_place_legends_inside()

grid1km_snapped |> mutate(flood=ifelse(flood<=0, F, T)) |>
tm_shape() +
tm_polygons(fill='flood', lwd=.1, fill.legend = tm_legend(title='Flood affected')) + tm_place_legends_inside()

Out of the 41539 cells covering Pakistan, 17735 are inhabited and 10412 are affected by floods.
Differentiate into origins and destinations
How many of the cells we are interested in are flooded?
# source points are only grids that are affected by flooding, contain population and are snapped
source_pts <- grid1km_snapped |> filter(flood>0 & pop>0 & !is.na(snapped_distance))
# destination points are all grid cells that are not affected by flooding and are snapped
destination_pts <- grid1km_snapped |> filter(!is.na(snapped_distance) & (flood<= 0 | is.na(flood)))
#grid1km_snapped |> tm_shape() + tm_polygons(fill='location_type', lwd=.1)
grid1km_snapped |> st_drop_geometry() |> mutate(flooded = ifelse(flood > 0 & !is.na(flood), T, F)) |>
group_by(snapped, flooded) |> summarise(pop=sum(pop)) |> kable(latex_options='striped') |> suppressMessages()
| snapped | flooded | pop |
|---------|---------|-----------|
| FALSE | FALSE | 2704146 |
| FALSE | TRUE | 1918885 |
| TRUE | FALSE | 98325684 |
| TRUE | TRUE | 117942897 |
How many source points do we miss due to not snapping?
603
How many destination points?
17664
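The counts above can be derived with filters mirroring those used for source_pts and destination_pts. A toy sketch with an invented four-cell data frame (the column names follow grid1km_snapped, the values are made up):

```r
# Invented toy cells: flood depth, population, snapping result
cells <- data.frame(
  flood = c(2, 0, 5, NA),
  pop = c(10, 0, 3, 7),
  snapped_distance = c(120, NA, NA, 80)
)
# flooded & populated cells whose centroid could not be snapped -> missed sources
missed_sources <- subset(cells, flood > 0 & pop > 0 & is.na(snapped_distance))
nrow(missed_sources) # 1 (the third cell)
# snapped & non-flooded cells -> destinations
destinations <- subset(cells, !is.na(snapped_distance) & (flood <= 0 | is.na(flood)))
nrow(destinations) # 1 (the fourth cell)
```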
Matrix request
Now we fire for each origin location a request to the ors matrix
endpoint to get the 100 nearest destinations. We iterate over all 7026
source points. This process is paralellized via the furrr package to
speed up the process.
nearest_threshold <- 100
# Set up parallel backend using furrr
plan(multisession, workers = 8) # Use 8 cores for parallel processing
# Function to process each row
process_row <- function(i) {
# Select single grid cell
grid1km_subset <- source_pts[i, ]
destination_pts <- destination_pts |>
mutate(dist = st_distance(st_centroid(destination_pts),
st_centroid(grid1km_subset)))
# Select the nearest_threshold rows with the lowest straight-line distance
next_10 <- destination_pts |>
arrange(dist) |>
slice_head(n = nearest_threshold)
coord_list <- map(next_10$snapped_centroid |> st_centroid(), ~ c(st_coordinates(.x)[, 1], st_coordinates(.x)[, 2]))
source_coords <- map(grid1km_subset$snapped_centroid |> st_centroid(), ~ c(st_coordinates(.x)[, 1], st_coordinates(.x)[, 2]))
coord_list_run <- c(source_coords, coord_list)
res1 <- ors_matrix(
coord_list_run,
sources = 0,
metrics = c("duration", "distance"),
units = "km",
api_key = api_key,
output = 'parsed'
)
source_id <- grid1km_subset$rowid
destination_ids <- next_10$rowid
bird_dists <- next_10$dist
# drop the first entry (source to itself) and coerce to numeric once
durs <- (res1$durations |> as.vector())[-1] |> as.numeric()
dists <- (res1$distances |> as.vector())[-1] |> as.numeric()
result <- data.frame(
source_id = source_id,
destination_ids = I(list(destination_ids)),
durations = durs |> list() |> I(),
distances = dists |> list() |> I(),
bird_dist = bird_dists |> as.numeric() |> list() |> I(),
min_dist = min(dists, na.rm = TRUE),
max_dist = max(dists, na.rm = TRUE),
mean_dist = mean(dists, na.rm = TRUE),
min_dur = min(durs, na.rm = TRUE),
max_dur = max(durs, na.rm = TRUE),
mean_dur = mean(durs, na.rm = TRUE),
na = durs |> is.na() |> sum()
) |> tibble()
return(result)
}
# Run the parallel processing using furrr
results <- future_map_dfr(1:nrow(source_pts), process_row)
# Combine the results into the final result matrix
result_matrix <- results
# Optionally, revert to sequential plan after the parallel work is done
plan(sequential)
Time for this code chunk to run: 954.71 seconds
From the function above we get for every origin:
- the source id
- the ids of the 100 destinations
- the travel duration from the source to each destination
- the travel distance to each destination
- the direct (beeline) distance
- the minimum travel distance
- the maximum travel distance
- the mean travel distance
- the minimum travel duration
- the maximum travel duration
- the mean travel duration
- the number of NAs, i.e. routes from the source to a destination that could not be computed
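Because the per-destination values are vectors, they end up in list-columns of result_matrix. A small invented example of how such a row can be built and read back:

```r
# One invented origin with three destination distances stored as a list-column
res <- data.frame(source_id = 1)
res$distances <- I(list(c(12.4, 18.9, 25.1)))
# unwrap the list-column to get the numeric vector for this origin
d <- res$distances[[1]]
min(d)  # 12.4
mean(d) # 18.8
```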
Next we join the matrix result back to our grid and do some postprocessing.
# join, convert time to hours
grid1km_snapped_joined <- grid1km_snapped |> left_join(result_matrix, by = c('rowid' = 'source_id'))
grid1km_snapped_joined <- grid1km_snapped_joined |> mutate(
# Inf arises when all routes for an origin were NA; turn it into NA
across(c(min_dist, max_dist, mean_dist, min_dur, max_dur, mean_dur),
~ ifelse(is.infinite(.x), NA, .x))
) |> mutate(
min_dur = min_dur / 3600,
mean_dur = mean_dur / 3600,
max_dur = max_dur / 3600
)
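The Inf-to-NA and seconds-to-hours steps can be illustrated on a plain vector (the values are made up): an infinite summary value means no destination was reachable, and it must become NA before the unit conversion.

```r
# Invented durations in seconds; Inf marks an origin with no reachable destination
dur <- c(7200, Inf, 3600)
dur <- ifelse(is.infinite(dur), NA, dur) # Inf -> NA
dur / 3600 # 2, NA, 1 (hours)
```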
Results visualization
Map of minimum travel distance
Map of the minimum travel distance in km from each flooded and inhabited location to the closest non-flooded location.
tmap_mode('view')
tm_shape(grid1km_snapped_joined) +
tm_polygons(fill='min_dist',
lwd=.1,
fill.legend=tm_legend(title='Travel distance in km'))